Llama 3 Overview
Meta has just launched a new series of large language models (LLMs) called Llama 3. This collection includes two versions: one with 8 billion parameters and another with 70 billion parameters. Both models have been pre-trained and fine-tuned to follow instructions.

Key Features of Llama 3
Here are the main technical aspects of Llama 3:

- Architecture: It uses a basic decoder-only transformer model.
- Vocabulary: The model can understand up to 128,000 different tokens (words or word parts).
- Training Length: It is trained on text sequences of up to 8,000 tokens.
- Attention Mechanism: It employs a technique called grouped query attention (GQA).
- Pre-training Data: The model has been pre-trained on over 15 trillion tokens.
- Post-training Methods: After initial training, it undergoes further training using techniques like supervised fine-tuning (SFT), rejection sampling, Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO).

Performance Highlights
- The 8 billion parameter version of Llama 3 (instruction-tuned) performs better than the Gemma 7B and Mistral 7B Instruct models.
- The 70 billion parameter version of Llama 3 significantly outperforms both the Gemini Pro 1.5 and Claude 3 Sonnet models, though it slightly lags behind Gemini Pro 1.5 on the MATH benchmark.
- Llama 3 models also excel in various benchmarks, such as AGIEval (English), MMLU, and Big-Bench Hard.

Upcoming Developments
Meta has announced plans to release a model with 400 billion parameters, which is still in training and expected to arrive soon. They are also working on features for multimodal support (handling different types of data), multilingual capabilities, and longer context windows. The latest checkpoint results for the Llama 3 400B model, as of April 15, 2024, show promising performance in benchmarks like MMLU and Big-Bench Hard.

For more details on licensing, you can check the model card.